    Opening the Black Box of wav2vec Feature Encoder

    Self-supervised models such as wav2vec and its variants have shown promising results on various downstream tasks in the speech domain. However, their inner workings are poorly understood, calling for in-depth analyses of what the models learn. In this paper, we concentrate on the convolutional feature encoder, whose latent space is often speculated to represent discrete acoustic units. To analyze the embedding space in a reductive manner, we feed it synthesized audio signals that are sums of simple sine waves. Through extensive experiments, we conclude that various kinds of information are embedded in the feature encoder representations: (1) fundamental frequency, (2) formants, and (3) amplitude, packed with (4) sufficient temporal detail. Further, the information in the latent representations is analogous to that in spectrograms, with one fundamental difference: the latent representations form a metric space in which closer representations imply acoustic similarity.
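
    The probing stimuli described above can be illustrated with a short Python sketch. It assumes a 16 kHz sample rate (the rate wav2vec 2.0 models expect) and a harmonic construction in which every sine is an integer multiple of a fundamental f0; the function name and parameters are hypothetical, and the paper's actual stimulus recipe may differ.

```python
import math

def synth_harmonic_tone(f0, harmonic_amps, duration=0.5, sr=16000):
    """Hypothetical stimulus generator: x(t) = sum_k a_k * sin(2*pi*k*f0*t), k = 1..K.

    f0:            fundamental frequency in Hz
    harmonic_amps: a_k for k = 1..K; scaling all of them changes the amplitude,
                   emphasizing particular harmonics crudely mimics formant peaks
    sr:            sample rate; 16 kHz is assumed (wav2vec 2.0 input rate)
    """
    nyquist = sr / 2
    n = int(duration * sr)
    out = []
    for i in range(n):
        t = i / sr
        # Skip harmonics at or above Nyquist so they do not alias.
        out.append(sum(a * math.sin(2 * math.pi * k * f0 * t)
                       for k, a in enumerate(harmonic_amps, start=1)
                       if k * f0 < nyquist))
    return out
```

    For instance, synth_harmonic_tone(120, [1.0, 0.8, 0.3]) yields a 120 Hz tone with two weaker overtones; varying one parameter (f0, the amplitude profile, or the overall scale) while holding the others fixed is the kind of controlled input that lets each factor be probed separately.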

    Parameterized Complexity Results for General Factors in Bipartite Graphs with an Application to Constraint Programming

    The NP-hard general factor problem asks, given a graph and a list of integers for each vertex, whether the graph has a spanning subgraph in which each vertex has a degree that belongs to its assigned list. The problem remains NP-hard even if the given graph is bipartite with partition U+V and each vertex in U is assigned the list {1}; this subproblem appears in constraint programming as the consistency problem for the extended global cardinality constraint. We show that this subproblem is fixed-parameter tractable when parameterized by the size of the second partite set V. More generally, we show that the general factor problem for bipartite graphs, parameterized by |V|, is fixed-parameter tractable as long as all vertices in U are assigned lists of length 1, but becomes W[1]-hard if vertices in U are assigned lists of length at most 2. We establish fixed-parameter tractability by reducing the problem instance to a bounded number of acyclic instances, each of which can be solved in polynomial time by dynamic programming.
    Comment: Full version of a paper that appeared in preliminary form in the proceedings of IPEC'1
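
    The problem definition can be made concrete with a deliberately naive Python checker that tries every edge subset. This is an exponential brute force for tiny instances only, not the fixed-parameter algorithm of the paper, and the function name is hypothetical. In one natural constraint-programming reading, U holds the variables (each takes exactly one value, hence the list {1}) and V holds the values with their allowed occurrence counts.

```python
from itertools import combinations

def has_general_factor(vertices, edges, lists):
    """Brute force: is there a spanning subgraph (any subset of `edges`)
    in which every vertex v has a degree contained in lists[v]?"""
    edges = list(edges)
    for r in range(len(edges) + 1):
        for chosen in combinations(edges, r):
            deg = {v: 0 for v in vertices}
            for u, w in chosen:
                deg[u] += 1
                deg[w] += 1
            if all(deg[v] in lists[v] for v in vertices):
                return True
    return False
```

    With U = {u1, u2}, V = {v1}, edges u1-v1 and u2-v1, and lists {1} on U, the instance is satisfiable when v1 is assigned {2} but not when it is assigned {1}, since each vertex in U can only reach v1.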

    Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

    This paper proposes an improved Goodness of Pronunciation (GoP) measure that uses Uncertainty Quantification (UQ) for automatic intelligibility assessment of dysarthric speech. Current GoP methods rely heavily on overconfident neural network predictions, which are unsuitable for assessing dysarthric speech because of its significant acoustic differences from healthy speech. To alleviate this problem, UQ techniques were applied to GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). Prior-normalized maxlogit GoP achieves the best performance, with relative increases of 5.66%, 3.91%, and 23.65% over the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, a phoneme-level analysis identifies which phoneme scores correlate significantly with intelligibility scores in each language.
    Comment: Accepted to Interspeech 202
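
    The normalizations named above can be sketched in Python using common textbook definitions, computed per frame for the canonical phoneme index c and averaged over the frames aligned to that phoneme. These definitions, the prior-normalized maxlogit form (logit minus log prior), and all function names are assumptions for illustration; the paper's exact formulas may differ.

```python
import math

def log_softmax(z):
    m = max(z)
    log_total = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - log_total for x in z]

def frame_scores(z, c, log_prior=None):
    """Assumed per-frame scores for canonical phoneme c, given logit vector z."""
    lp = log_softmax(z)
    p = [math.exp(x) for x in lp]
    best_other_p = max(p[j] for j in range(len(z)) if j != c)
    best_other_z = max(z[j] for j in range(len(z)) if j != c)
    scores = {
        "posterior": lp[c],                                  # baseline GoP term
        "entropy": -sum(pi * li for pi, li in zip(p, lp)),   # higher = less certain
        "margin": p[c] - best_other_p,
        "maxlogit": z[c],                                    # raw logit, no softmax
        "logit_margin": z[c] - best_other_z,
    }
    if log_prior is not None:
        # Hypothetical prior normalization: subtract the phoneme's log prior.
        scores["prior_norm_maxlogit"] = z[c] - log_prior[c]
    return scores

def gop(frames, c, kind="posterior", log_prior=None):
    """Average one per-frame score over the frames aligned to phoneme c."""
    vals = [frame_scores(z, c, log_prior)[kind] for z in frames]
    return sum(vals) / len(vals)
```

    Because the maxlogit variants skip the softmax, they avoid squashing every confident frame toward a log posterior of 0, which is one plausible reason such scores spread out better on atypical speech.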

    Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning

    Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, atypical speech is difficult to obtain, which often leads to data scarcity. To tackle this problem, we propose a novel automatic severity assessment method for dysarthric speech that uses a self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity level classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted features such as eGeMAPS and linguistic features with SVM, MLP, and XGBoost classifiers. Evaluated on the Korean dysarthric speech QoLT database, our model outperforms the traditional baselines with a relative increase of 4.79% in classification accuracy. In addition, the proposed model surpasses the model trained without the ASR head, with a 10.09% relative improvement. Furthermore, we analyze the latent representations and the regularization effect to show how multi-task learning affects severity classification performance.
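
    The joint training objective can be sketched as a weighted sum of the two task losses. This Python sketch assumes a convex combination with a hypothetical weight asr_weight and treats the ASR loss as a precomputed number (for wav2vec 2.0 fine-tuning this would typically be a CTC loss); the paper's exact weighting is not stated here.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_total - logits[target]

def multitask_loss(severity_logits, severity_label, asr_loss, asr_weight=0.5):
    """Hypothetical joint objective: severity classification + auxiliary ASR.

    asr_loss is assumed to come from a separate ASR head (e.g. CTC), not shown.
    asr_weight = 0 reduces to training the classifier alone.
    """
    cls_loss = cross_entropy(severity_logits, severity_label)
    return (1 - asr_weight) * cls_loss + asr_weight * asr_loss
```

    Setting asr_weight to 0 corresponds, at the loss level, to the comparison model trained without the ASR head; a nonzero weight lets the ASR gradient act as the regularizer the abstract analyzes.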

    Impact of Sexual Attitude and Marital Intimacy on Sexual Satisfaction in Pregnant Couples: An Application of the Actor-Partner Interdependence Model

    PURPOSE: The purpose of this study was to investigate the actor and partner effects of sexual attitude and marital intimacy on sexual satisfaction in pregnant couples. METHODS: Data were collected from 176 pregnant couples visiting hospitals for prenatal care from June 18 to September 24, 2016. The data were analyzed with paired t-tests and Pearson's correlation coefficients in SPSS 18.0, and interdependence effects were assessed with the Actor-Partner Interdependence Model in AMOS 18.0. RESULTS: Neither the sexual attitude (β=.12, p=.141) nor the marital intimacy (β=.01, p=.938) of the pregnant women had a partner effect on their husbands' sexual satisfaction. The husbands' sexual attitude had a partner effect on the pregnant women's sexual satisfaction (β=.13, p=.021), but the husbands' marital intimacy did not (β=.07, p=.202). CONCLUSION: This study suggests that the sexual attitude and marital intimacy of pregnant couples should be considered when developing interventions to improve couples' sexual satisfaction. Moreover, pregnant couples should participate in interventions together, because sexual satisfaction is interdependent between the two partners.

    The change of QRS duration after pulmonary valve replacement in patients with repaired tetralogy of Fallot and pulmonary regurgitation

    Purpose This study aimed to analyze changes in QRS duration and cardiothoracic ratio (CTR) following pulmonary valve replacement (PVR) in patients with tetralogy of Fallot (TOF). Methods Children and adolescents (n=67; median age, 16 years) who had previously undergone total repair of TOF and required elective PVR for pulmonary regurgitation and/or right ventricular outflow tract obstruction were included in this study. QRS duration and CTR were measured pre- and postoperatively, and the postoperative changes were evaluated. Results Following PVR, the CTR decreased significantly (pre-PVR 57.2%±6.2%, post-PVR 53.8%±5.5%, P=0.002). The postoperative QRS duration showed a tendency to decrease (pre-PVR 162.7±26.4 msec, post-PVR 156.4±24.4 msec, P=0.124). QRS duration was greater than 180 msec in 6 patients prior to PVR. Of these, 5 patients showed a decrease in QRS duration following PVR; QRS duration fell below 180 msec in 2 patients and remained above 180 msec in 3 patients, including 2 patients with diffuse postoperative right ventricular outflow tract hypokinesis. Six patients had coexisting arrhythmias before PVR: atrial tachycardia (2 patients), premature ventricular contraction (3 patients), and premature atrial contraction (1 patient). None of the patients presented with arrhythmia following PVR. Conclusion The CTR and QRS duration decreased following PVR. However, QRS duration may not decrease below 180 msec after PVR, particularly in patients with right ventricular outflow tract hypokinesis. The CTR and ECG may provide additional clinical information on changes in right ventricular volume and/or pressure in these patients.

    Algorithm for Finding k-Vertex Out-trees and its Application to k-Internal Out-branching Problem

    An out-tree $T$ is an oriented tree with exactly one vertex of in-degree zero. A vertex $x$ of $T$ is internal if its out-degree is positive. We design randomized and deterministic algorithms for deciding whether an input digraph contains a given out-tree with $k$ vertices. The algorithms run in time $O^*(5.704^k)$ and $O^*(5.704^{k(1+o(1))})$, respectively. We apply the deterministic algorithm to obtain a deterministic algorithm of runtime $O^*(c^k)$, where $c$ is a constant, for deciding whether an input digraph contains a spanning out-tree with at least $k$ internal vertices. This answers in the affirmative a question of Gutin, Razgon and Kim (Proc. AAIM'08).
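
    The two definitions above (out-tree and internal vertex) can be checked directly with a small Python sketch. It only verifies a candidate structure and counts its internal vertices; it is not the paper's search algorithm, and the function names are hypothetical. Arcs are assumed to be (tail, head) pairs over the given vertex set.

```python
from collections import defaultdict, deque

def is_out_tree(vertices, arcs):
    """True if `arcs` form an oriented tree on `vertices` with exactly one
    vertex of in-degree zero (the root)."""
    vertices = set(vertices)
    indeg = {v: 0 for v in vertices}
    children = defaultdict(list)
    for u, w in arcs:
        indeg[w] += 1
        children[u].append(w)
    roots = [v for v in vertices if indeg[v] == 0]
    # Exactly one root, every other vertex entered by exactly one arc.
    if len(roots) != 1 or any(d > 1 for d in indeg.values()):
        return False
    # All vertices reachable from the root rules out detached cycles.
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        u = queue.popleft()
        for w in children[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)

def internal_vertices(vertices, arcs):
    """Vertices with positive out-degree."""
    outdeg = {v: 0 for v in set(vertices)}
    for u, _ in arcs:
        outdeg[u] += 1
    return {v for v, d in outdeg.items() if d > 0}
```

    A spanning out-tree of a digraph witnesses a yes-instance of the k-internal problem exactly when is_out_tree holds over all vertices and len(internal_vertices(...)) is at least k.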